The two possible values of the
chromatic number of a random graph 

By DIMITRIS ACHLIOPTAS and ASSAF NAOR*

Abstract 

Given $d \in (0,\infty)$ let $k_d$ be the smallest integer $k$ such that
$d < 2k\log k$. We prove that the chromatic number of a random graph
$G(n, d/n)$ is either $k_d$ or $k_d + 1$ almost surely.

1. Introduction 

The classical model of random graphs, in which each possible edge on n 
vertices is chosen independently with probability p, is denoted by G(n,p). This 
model, introduced by Erdős and Rényi in 1960, has been studied intensively
in the past four decades. We refer to the books [3], [5], [11] and the references 
therein for accounts of many remarkable results on random graphs, as well as 
for their connections to various areas of mathematics. In the present paper 
we consider random graphs of bounded average degree, i.e., $p = d/n$ for some
fixed $d \in (0,\infty)$.

One of the most important invariants of a graph $G$ is its chromatic number
$\chi(G)$, namely the minimum number of colors required to color its vertices
so that no pair of adjacent vertices has the same color. Since the mid-1970s,
work on $\chi(G(n,p))$ has been in the forefront of random graph theory,
motivating some of the field's most significant developments. Indeed, one of
the most fascinating facts known [13] about random graphs is that for every
$d \in (0,\infty)$ there exists an integer $k_d$ such that almost surely
$\chi(G(n,d/n))$ is either $k_d$ or $k_d + 1$. The value of $k_d$ itself,
nevertheless, remained a mystery.

To date, the best known [12] estimate for $\chi(G(n,d/n))$ confines it to an
interval of length about $d \cdot \frac{29\log\log d}{2(\log d)^2}$. In our
main result we reduce this length to 2. Specifically, we prove

Theorem 1. Given $d \in (0,\infty)$, let $k_d$ be the smallest integer $k$
such that $d < 2k\log k$. With probability that tends to 1 as $n \to \infty$,

\[ \chi(G(n,d/n)) \in \{k_d, k_d + 1\} . \]



*Work performed while the first author was at Microsoft Research. 
Indeed, we determine $\chi(G(n,d/n))$ exactly for roughly half of all
$d \in (0,\infty)$.

Theorem 2. If $d \in [(2k-1)\log k, 2k\log k)$, then with probability that
tends to 1 as $n \to \infty$,

\[ \chi(G(n,d/n)) = k + 1 . \]
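As an illustration of how Theorems 1 and 2 are read off for a concrete average degree, the following minimal sketch computes $k_d$ and the set of values the two theorems allow. The function names `k_d` and `possible_chromatic_numbers` are hypothetical and chosen only for this example; logarithms are natural.

```python
import math


def k_d(d: float) -> int:
    """Smallest integer k >= 1 such that d < 2 * k * log(k)."""
    if d <= 0:
        raise ValueError("d must be positive")
    k = 1
    while d >= 2 * k * math.log(k):
        k += 1
    return k


def possible_chromatic_numbers(d: float) -> tuple:
    """Values allowed (almost surely) for chi(G(n, d/n)) by Theorems 1 and 2."""
    k = k_d(d)
    # Theorem 2: for d in [(2k-1) log k, 2k log k) the value is exactly k + 1.
    if d >= (2 * k - 1) * math.log(k):
        return (k + 1,)
    # Otherwise Theorem 1 leaves the two values k_d and k_d + 1.
    return (k, k + 1)
```

For instance, $d = 10$ falls in the window of Theorem 2 for $k = 4$, while $d = 8$ does not, so only Theorem 1 applies there.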

The first questions regarding the chromatic number of $G(n,d/n)$ were
raised in the original Erdős-Rényi paper [8] from 1960. It was not until the
1990s, though, that any progress was made on the problem. Specifically, by
the mid 1970s, the expected value of $\chi(G(n,p))$ was known up to a factor
of two for the case of fixed $p$, due to the work of Bollobás and Erdős [6]
and Grimmett and McDiarmid [10]. This gap remained in place for another
decade until, in a celebrated paper, Bollobás [4] proved that for every
constant $p \in (0,1)$, almost surely
$\chi(G(n,p)) = \frac{n}{2\log n}\log\left(\frac{1}{1-p}\right)(1+o(1))$.
Łuczak [12] later extended this result to all $p > d_0/n$, where $d_0$ is a
universal constant.

Questions regarding the concentration of the chromatic number were first
examined in a seminal paper of Shamir and Spencer [14] in the mid-80s. They
showed that $\chi(G(n,p))$ is concentrated in an interval of length
$O(\sqrt{n})$ for all $p$ and on an interval of length 5 for
$p < n^{-1/6-\varepsilon}$. Łuczak [13] showed that, for
$p < n^{-1/6-\varepsilon}$, the chromatic number is, in fact, concentrated on
an interval of length 2. Finally, Alon and Krivelevich [2] extended 2-value
concentration to all $p < n^{-1/2-\varepsilon}$.

The Shamir-Spencer theorem mentioned above was based on analyzing the
so-called vertex exposure martingale. Indeed, this was the first use of
martingale methods in random graph theory. Later, a much more refined
martingale argument was the key step in Bollobás' evaluation of the
asymptotic value of $\chi(G(n,p))$. This influential line of reasoning has
fuelled many developments in probabilistic combinatorics; in particular, all
the results mentioned above [12], [13], [2] rely on martingale techniques.

Our proof of Theorem 1 is largely analytic, breaking with more traditional
combinatorial arguments. The starting point for our approach is recent
progress on the theory of sharp thresholds. Specifically, using
Fourier-analytic arguments, Friedgut [9] has obtained a deep criterion for
the existence of sharp thresholds for random graph properties. Using
Friedgut's theorem, Achlioptas and Friedgut [1] proved that the probability
that $G(n, d/n)$ is $k$-colorable drops from almost 1 to almost 0 as $d$
crosses an interval whose length tends to 0 with $n$. Thus, in order to prove
that $G(n, d/n)$ is almost surely $k$-colorable it suffices to prove that
$\liminf_{n\to\infty} \Pr[G(n, d'/n) \text{ is } k\text{-colorable}] > 0$, for
some $d' > d$. To do that we use the second moment method, which is based on
the following special case of the Paley-Zygmund inequality: for any
nonnegative random variable $X$, $\Pr[X > 0] \ge (\mathbf{E}X)^2/\mathbf{E}X^2$.






Specifically, the number of $k$-colorings of a random graph is the sum,
over all $k$-partitions $\sigma$ of its vertices (into $k$ "color classes"),
of the indicator that $\sigma$ is a valid coloring. To estimate the second
moment of the number of $k$-colorings we thus need to understand the
correlation between these indicators. It turns out that this correlation is
determined by $k^2$ parameters: given two $k$-partitions $\sigma$ and $\tau$,
the probability that both of them are valid colorings is determined by the
number of vertices that receive color $i$ in $\sigma$ and color $j$ in
$\tau$, where $1 \le i,j \le k$.

In typical second moment arguments, the main task lies in using
probabilistic and combinatorial reasoning to construct a random variable for
which correlations can be controlled. We achieve this here by focusing on the
number, $Z$, of $k$-colorings in which all color classes have exactly the
same size. However, we face an additional difficulty, of an entirely
different nature: the correlation parameter is inherently high dimensional.
As a result, estimating $\mathbf{E}Z^2$ reduces to a certain entropy-energy
inequality over $k \times k$ doubly stochastic matrices and, thus, our
argument shifts to the analysis of an optimization problem over the Birkhoff
polytope. Using geometric and analytic ideas we establish the desired
inequality as a particular case of a general optimization principle that we
formulate (Theorem 9). We believe that this principle will find further
applications, for example in probability and statistical physics, as moment
estimates are often characterized by similar trade-offs.



2. Preliminaries

We will say that a sequence of events $\mathcal{E}_n$ occurs with high
probability (w.h.p.) if $\lim_{n\to\infty} \Pr[\mathcal{E}_n] = 1$ and with
uniformly positive probability (w.u.p.p.) if
$\liminf_{n\to\infty} \Pr[\mathcal{E}_n] > 0$. Throughout, we will consider
$k$ to be arbitrarily large but fixed, while $n$ tends to infinity. In
particular, all asymptotic notation is with respect to $n \to \infty$.

To prove Theorems 1 and 2 it will be convenient to introduce a slightly
different model of random graphs. Let $G(n, m)$ denote a random (multi)graph
on $n$ vertices with precisely $m$ edges, each edge formed by joining two
vertices selected uniformly, independently, and with replacement. The
following elementary argument was first suggested by Luc Devroye (see [7]).

Lemma 3. Define

\[ u_k = \frac{\log k}{\log k - \log(k-1)} < \left(k - \frac{1}{2}\right)\log k . \]

If $c > u_k$, then a random graph $G(n, m = cn)$ is w.h.p. non-$k$-colorable.
Proof. Let $Y$ be the number of $k$-colorings of a random graph $G(n,m)$.
By Markov's inequality, $\Pr[Y > 0] \le \mathbf{E}[Y] \le k^n (1 - 1/k)^m$
since, in any fixed $k$-partition, a random edge is monochromatic with
probability at least $1/k$. For $c > u_k$, we have $k(1 - 1/k)^c < 1$,
implying $\mathbf{E}[Y] \to 0$. □
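The first moment bound can be checked numerically. The sketch below assumes the threshold is $u_k = \log k / (\log k - \log(k-1))$, which is exactly the value of $c$ at which $k(1-1/k)^c = 1$; the helper names are hypothetical.

```python
import math


def u_k(k: int) -> float:
    """Value of c at which k * (1 - 1/k)**c equals 1 (requires k >= 2)."""
    return math.log(k) / (math.log(k) - math.log(k - 1))


def first_moment_base(k: int, c: float) -> float:
    """Base b of the bound E[Y] <= b**n for G(n, m = c*n), b = k (1 - 1/k)^c."""
    return k * (1 - 1 / k) ** c
```

When the base is below 1 the expected number of $k$-colorings decays exponentially in $n$, which is all the proof of Lemma 3 uses.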

Define

\[ c_k = k\log k . \]

We will prove

Proposition 4. If $c < c_{k-1}$, then a random graph $G(kn, m = ckn)$ is
w.u.p.p. $k$-colorable.

Finally, as mentioned in the introduction, we will use the following result 
of [1]. 

Theorem 5 (Achlioptas and Friedgut [1]). Fix $d^* > d > 0$. If $G(n,d^*/n)$
is $k$-colorable w.u.p.p. then $G(n,d/n)$ is $k$-colorable w.h.p.

We now prove Theorems 1 and 2 given Proposition 4. 

Proof of Theorems 1 and 2. A random graph $G(n, m)$ may contain some
loops and multiple edges. Writing $q = q(G(n,m))$ for the number of such
blemishes we see that their removal results in a graph on $n$ vertices whose
edge set is uniformly random among all edge sets of size $m - q$. Moreover,
note that if $m \le cn$ for some constant $c$, then w.h.p. $q = o(n)$.
Finally, note that the edge-set of a random graph $G(n, p = 2c/n)$ is
uniformly random conditional on its size, and that w.h.p. this size is in the
range $cn \pm n^{2/3}$. Thus, if $A$ is any monotone decreasing property that
holds with probability at least $\theta > 0$ in $G(n, m = cn)$, then $A$ must
hold with probability at least $\theta - o(1)$ in $G(n,d/n)$ for any constant
$d < 2c$ and similarly, for increasing properties and $d > 2c$. Therefore,
Lemma 3 implies that $G(n,d/n)$ is w.h.p. non-$k$-colorable for
$d \ge (2k-1)\log k > 2u_k$.

To prove both theorems it thus suffices to prove that $G(n, d/n)$ is w.h.p.
$k$-colorable if $d < 2c_{k-1}$. Let $n'$ be the smallest multiple of $k$
greater than $n$. Clearly, if $k$-colorability holds with probability
$\theta$ in $G(n',d/n')$ then it must hold with probability at least $\theta$
in $G(t, d/n')$ for all $t \le n'$. Moreover, for $n \le t \le n'$,
$d/n' = (1 - o(1))d/t$. Thus, if $G(kn', m = ckn')$ is $k$-colorable
w.u.p.p., then $G(n,d/n)$ is $k$-colorable w.u.p.p. for all $d < 2c$.
Invoking Proposition 4 and Theorem 5 we thus conclude that $G(n,d/n)$ is
w.h.p. $k$-colorable for all $d < 2c_{k-1}$. □
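The multigraph model and the removal of blemishes can be made concrete as follows. This is a minimal sketch under one reading of the text: every loop and every repeated copy of an edge counts as one blemish, so that the cleaned graph has exactly $m - q$ edges. The function names are hypothetical.

```python
import random


def sample_gnm_multigraph(n: int, m: int, rng: random.Random) -> list:
    """m edges, each joining two vertices drawn uniformly, independently,
    and with replacement from {0, ..., n-1}."""
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]


def remove_blemishes(edges: list) -> tuple:
    """Drop loops and repeated edges; return (simple_edge_set, q)."""
    simple = set()
    q = 0
    for u, v in edges:
        if u == v:
            q += 1  # loop
            continue
        e = (min(u, v), max(u, v))
        if e in simple:
            q += 1  # extra copy of an existing edge
        else:
            simple.add(e)
    return simple, q
```

With $m = cn$ for a constant $c$ one expects $q$ to be much smaller than $n$, in line with the w.h.p. statement $q = o(n)$ above.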

In the next section we reduce the proof of Proposition 4 to an analytic 
inequality, which we then prove in the remaining sections. 
3. The second moment method and stochastic matrices



In the following we will only consider random graphs $G(n, m = cn)$ where
$n$ is a multiple of $k$ and $c > 0$ is a constant. We will say that a
partition of $n$ vertices into $k$ parts is balanced if each part contains
precisely $n/k$ vertices. Let $Z$ be the number of balanced $k$-colorings.
Observe that each balanced partition is a valid $k$-coloring with probability
$(1 - 1/k)^m$. Thus, by Stirling's approximation,

\[ \mathbf{E}Z = \frac{n!}{[(n/k)!]^k}\left(1 - \frac{1}{k}\right)^{cn}
   = \Theta\left(\frac{1}{n^{(k-1)/2}}\right) \cdot
     \left[k\left(1 - \frac{1}{k}\right)^{c}\right]^{n} . \tag{1} \]

Observe that the probability that a $k$-partition is a valid $k$-coloring is
maximized when the partition is balanced. Thus, focusing on balanced
partitions reduces the number of colorings considered by only a polynomial
factor, while significantly simplifying calculations. We will show that
$\mathbf{E}Z^2 < C \cdot (\mathbf{E}Z)^2$ for some $C = C(k,c) < \infty$.
By (1) this reduces to proving

\[ \mathbf{E}Z^2 = O\left(\frac{1}{n^{k-1}}\right) \cdot
   \left[k\left(1 - \frac{1}{k}\right)^{c}\right]^{2n} . \]

This will conclude the proof of Proposition 4 since
$\Pr[Z > 0] \ge (\mathbf{E}Z)^2/\mathbf{E}Z^2$.

Since $Z$ is the sum of $n!/[(n/k)!]^k$ indicator variables, one for each
balanced partition, we see that to calculate $\mathbf{E}Z^2$ it suffices to
consider all pairs of balanced partitions and, for each pair, bound the
probability that both partitions are valid colorings. For any fixed pair of
partitions $\sigma$ and $\tau$, since edges are chosen independently, this
probability is the $m$th power of the probability that a random edge is
bichromatic in both $\sigma$ and $\tau$. If $\ell_{ij}$ is the number of
vertices with color $i$ in $\sigma$ and color $j$ in $\tau$, this single-edge
probability is

\[ 1 - \frac{2}{k} + \sum_{i=1}^{k}\sum_{j=1}^{k}\left(\frac{\ell_{ij}}{n}\right)^2 . \]

Observe that the second term above is independent of the $\ell_{ij}$ only
because $\sigma$ and $\tau$ are balanced.
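The single-edge probability follows from inclusion-exclusion: an edge is monochromatic in each balanced partition with probability $1/k$, and monochromatic in both with probability $\sum_{i,j}(\ell_{ij}/n)^2$. The sketch below compares the closed form $1 - 2/k + \sum_{i,j}(\ell_{ij}/n)^2$ with a brute-force enumeration of ordered endpoint pairs on a small example; the function names and the sample partitions are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def bichromatic_probability(sigma: list, tau: list) -> Fraction:
    """Exact probability that an edge with endpoints drawn uniformly with
    replacement is bichromatic in both colorings sigma and tau."""
    n = len(sigma)
    good = sum(
        1
        for u, v in product(range(n), repeat=2)
        if sigma[u] != sigma[v] and tau[u] != tau[v]
    )
    return Fraction(good, n * n)


def overlap_formula(sigma: list, tau: list, k: int) -> Fraction:
    """1 - 2/k + sum_{i,j} (l_ij / n)^2, valid for balanced sigma and tau."""
    n = len(sigma)
    ell = [[0] * k for _ in range(k)]
    for s, t in zip(sigma, tau):
        ell[s][t] += 1
    return 1 - Fraction(2, k) + sum(Fraction(x, n) ** 2 for row in ell for x in row)
```

Exact rational arithmetic is used so that the two quantities can be compared with `==` rather than up to a tolerance.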

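Denote by $\mathcal{D}$ the set of all $k \times k$ matrices
$L = (\ell_{ij})$ of nonnegative integers such that the sum of each row and
each column is $n/k$. For any such matrix $L$ observe that there are
$n!/\left(\prod_{i,j} \ell_{ij}!\right)$ corresponding pairs of balanced
partitions. Therefore,

\[ \mathbf{E}Z^2 = \sum_{L \in \mathcal{D}}
   \frac{n!}{\prod_{i=1}^{k}\prod_{j=1}^{k} \ell_{ij}!} \cdot
   \left[1 - \frac{2}{k} + \sum_{i=1}^{k}\sum_{j=1}^{k}
   \left(\frac{\ell_{ij}}{n}\right)^2\right]^{cn} . \tag{2} \]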
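For very small $n$ the sum over overlap matrices can be evaluated exactly. The following sketch enumerates all $k \times k$ nonnegative integer matrices with row and column sums $n/k$, weights each by the multinomial count of partition pairs, and compares the result with the first moment. All function names are hypothetical, and the enumeration is only practical for tiny $n$ and $k$.

```python
from fractions import Fraction
from math import factorial, prod


def compositions(total: int, parts: int):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def overlap_matrices(k: int, s: int):
    """Yield all k x k nonnegative integer matrices with all margins equal to s."""
    def extend(rows, col_left):
        if len(rows) == k - 1:
            # The last row is forced by the remaining column sums.
            yield rows + [tuple(col_left)]
            return
        for row in compositions(s, k):
            if all(x <= c for x, c in zip(row, col_left)):
                yield from extend(rows + [row], [c - x for x, c in zip(row, col_left)])
    yield from extend([], [s] * k)


def second_moment(n: int, k: int, m: int) -> Fraction:
    """E[Z^2] for G(n, m): sum over overlap matrices of count * p_edge**m."""
    total = Fraction(0)
    for L in overlap_matrices(k, n // k):
        cells = [x for row in L for x in row]
        pairs = factorial(n) // prod(factorial(x) for x in cells)
        p_edge = 1 - Fraction(2, k) + sum(Fraction(x, n) ** 2 for x in cells)
        total += pairs * p_edge ** m
    return total


def first_moment(n: int, k: int, m: int) -> Fraction:
    """E[Z] for G(n, m): number of balanced partitions times (1 - 1/k)**m."""
    return Fraction(factorial(n), factorial(n // k) ** k) * (1 - Fraction(1, k)) ** m
```

With $m = 0$ every pair of partitions is counted with weight 1, so the sum must equal the squared number of balanced partitions; this gives a simple consistency check on the multinomial counts.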



To get a feel for the sum in (2) observe that the term corresponding to
$\ell_{ij} = n/k^2$ for all $i,j$, alone, is
$\Theta(n^{-(k^2-1)/2}) \cdot [k(1 - 1/k)^c]^{2n}$. In fact, the terms
corresponding to matrices for which $\ell_{ij} = n/k^2 \pm O(\sqrt{n})$
already sum to $\Theta((\mathbf{E}Z)^2)$. To establish
$\mathbf{E}Z^2 = O((\mathbf{E}Z)^2)$ we will show that for $c < c_{k-1}$ the
terms in the sum (2) decay exponentially in their distance from
$(\ell_{ij}) = (n/k^2)$ and apply Lemma 6 below. This lemma is a variant of
the classical Laplace method of asymptotic analysis in the case of the
Birkhoff polytope $\mathcal{B}_k$, i.e., the set of all $k \times k$ doubly
stochastic matrices. For a matrix $A \in \mathcal{B}_k$ we denote by $\rho_A$
the square of its 2-norm, i.e. $\rho_A = \sum_{i,j} a_{ij}^2 = \|A\|_2^2$.
Moreover, let $\mathcal{H}(A)$ denote the entropy of $A$, which is defined as

\[ \mathcal{H}(A) = -\frac{1}{k}\sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}\log a_{ij} . \tag{3} \]

Finally, let $J_k \in \mathcal{B}_k$ be the constant $\frac{1}{k}$ matrix.

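Lemma 6. Assume that $\varphi : \mathcal{B}_k \to \mathbb{R}$ and
$\beta > 0$ are such that for every $A \in \mathcal{B}_k$,

\[ \mathcal{H}(A) + \varphi(A) \le \mathcal{H}(J_k) + \varphi(J_k) - \beta(\rho_A - 1) . \]

Then there exists a constant $C = C(\beta, k) > 0$ such that

\[ \sum_{L \in \mathcal{D}} \frac{n!}{\prod_{i=1}^{k}\prod_{j=1}^{k} \ell_{ij}!}
   \cdot \exp\left[n \cdot \varphi\left(\frac{k}{n}L\right)\right]
   \le \frac{C}{n^{k-1}} \cdot \left(k^2 e^{\varphi(J_k)}\right)^{n} . \tag{4} \]

The proof of Lemma 6 is presented in Section 6.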

Let $\mathcal{S}_k$ denote the set of all $k \times k$ row-stochastic
matrices. For $A \in \mathcal{S}_k$ define

\[ g_c(A) = -\frac{1}{k}\sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}\log a_{ij}
   + c\log\left(1 - \frac{2}{k} + \frac{1}{k^2}\sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}^2\right)
   = \mathcal{H}(A) + c\,\mathcal{E}(A) . \]

The heart of our analysis is the following inequality. Recall that
$c_{k-1} = (k-1)\log(k-1)$.

Theorem 7. For every $A \in \mathcal{S}_k$ and $c \le c_{k-1}$,
$g_c(J_k) \ge g_c(A)$.

Theorem 7 is a consequence of a general optimization principle that we
will prove in Section 4 and which is of independent interest. We conclude
this section by showing how Theorem 7 implies
$\mathbf{E}Z^2 = O((\mathbf{E}Z)^2)$ and, thus, Proposition 4.

For any $A \in \mathcal{B}_k \subset \mathcal{S}_k$ and $c < c_{k-1}$ we have

\[ g_c(J_k) - g_c(A) = g_{c_{k-1}}(J_k) - g_{c_{k-1}}(A)
   + (c_{k-1} - c)\log\left(1 + \frac{\rho_A - 1}{(k-1)^2}\right)
   \ge (c_{k-1} - c)\,\frac{\rho_A - 1}{2(k-1)^2} , \]

where for the inequality we applied Theorem 7 with $c = c_{k-1}$ and used
that $\rho_A \le k$ so that $\frac{\rho_A - 1}{(k-1)^2} \le \frac{1}{2}$.
Thus, for every $c < c_{k-1}$ and every $A \in \mathcal{B}_k$

\[ g_c(A) \le g_c(J_k) - \frac{c_{k-1} - c}{2(k-1)^2} \cdot (\rho_A - 1) . \tag{5} \]

Setting $\beta = (c_{k-1} - c)/(2(k-1)^2)$ and applying Lemma 6 with
$\varphi(\cdot) = c\,\mathcal{E}(\cdot)$ yields
$\mathbf{E}Z^2 = O((\mathbf{E}Z)^2)$.

One can interpret the maximization of $g_c$ geometrically by recalling that
the vertices of the Birkhoff polytope are the $k!$ permutation matrices (each
such matrix having one non-zero element in each row and column) and $J_k$ is
its barycenter. By convexity, $J_k$ is the maximizer of the entropy over
$\mathcal{B}_k$ and the minimizer of the 2-norm. By the same token, the
permutation matrices are minimizers of the entropy and maximizers of the
2-norm. The constant $c$ is, thus, the control parameter determining the
relative importance of each quantity. Indeed, it is not hard to see that for
sufficiently small $c$, $g_c$ is maximized by $J_k$ while for sufficiently
large $c$ it is not. The pertinent question is when the transition occurs,
i.e., what is the smallest value of $c$ for which the norm gain away from
$J_k$ makes up for the entropy loss. Probabilistically, this is the point
where the second moment explodes (relative to the square of the
expectation), as the dominant contribution stops corresponding to
uncorrelated $k$-colorings, i.e., to $J_k$.

The generalization from $\mathcal{B}_k$ to $\mathcal{S}_k$ is motivated by
the desire to exploit the product structure of the polytope $\mathcal{S}_k$,
and Theorem 7 is optimal with respect to $c$, up to an additive constant. At
the same time, it is easy to see that the maximizer of $g_c$ over
$\mathcal{B}_k$ is not $J_k$ already when $c = u_k - 1$, e.g.
$g_c(J_k) < g_c(A)$ for a suitable convex combination $A$ of $J_k$ and the
identity matrix $I$. In other words, applying the second moment method to
balanced $k$-colorings cannot possibly match the first moment upper bound.
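The trade-off between entropy and 2-norm can be explored numerically on the one-parameter family $\alpha J_k + (1-\alpha) I$ of doubly stochastic matrices. The sketch below assumes $g_c(A) = \mathcal{H}(A) + c\log(1 - 2/k + \rho_A/k^2)$ with $\mathcal{H}$ as in (3); the family, the grid and the function names are illustrative choices made for this example, not taken from the paper.

```python
import math


def g_c(A: list, c: float) -> float:
    """g_c(A) = H(A) + c * log(1 - 2/k + rho_A / k^2) for a row-stochastic A."""
    k = len(A)
    entropy = -sum(a * math.log(a) for row in A for a in row if a > 0) / k
    rho = sum(a * a for row in A for a in row)
    return entropy + c * math.log(1 - 2 / k + rho / k ** 2)


def mix(k: int, alpha: float) -> list:
    """alpha * J_k + (1 - alpha) * I, where J_k has all entries equal to 1/k."""
    return [
        [alpha / k + (1 - alpha) * (1.0 if i == j else 0.0) for j in range(k)]
        for i in range(k)
    ]
```

For $k = 10$, scanning $\alpha$ over a grid shows $J_k$ (the case $\alpha = 1$) winning at $c = c_{k-1}$, as Theorem 7 requires, while at $c = u_k - 1$ some member of the family already has a larger value of $g_c$.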

4. Optimization on products of simplices 

In this section we will prove an inequality which is the main step in the 
proof of Theorem 7. This will be done in a more general framework since the 
greater generality, beyond its intrinsic interest, actually leads to a simplification 
over the "brute force" argument. 

In what follows we denote by $\Delta_k$ the $k$-dimensional simplex
$\{(x_1, \dots, x_k) \in [0,1]^k : \sum_{i=1}^{k} x_i = 1\}$ and by
$S^{k-1} \subset \mathbb{R}^k$ the unit Euclidean sphere centered at the
origin. Recall that $\mathcal{S}_k$ denotes the set of all $k \times k$ (row)
stochastic matrices. For $1 \le \rho \le k$ we denote by $\mathcal{S}_k(\rho)$
the set of all $k \times k$ stochastic matrices with 2-norm $\sqrt{\rho}$,
i.e., $\mathcal{S}_k(\rho) = \{A \in \mathcal{S}_k ; \|A\|_2^2 = \rho\}$.
Definition 8. For $\frac{1}{k} \le r \le 1$, let $s^*(r)$ be the unique
vector in $\Delta_k$ of the form $(x, y, \dots, y)$ having 2-norm $\sqrt{r}$.
Observe that

\[ x = x_r = \frac{1 + \sqrt{(k-1)(kr-1)}}{k} \quad\text{and}\quad
   y = y_r = \frac{1 - x_r}{k-1} . \]

Given $h : [0,1] \to \mathbb{R}$ and an integer $k > 1$ we define a function
$f : [1/k, 1] \to \mathbb{R}$ as

\[ f(r) = h(x_r) + (k-1) \cdot h(y_r) . \tag{6} \]
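A minimal numerical sketch of Definition 8, assuming $x_r = (1 + \sqrt{(k-1)(kr-1)})/k$ and $y_r = (1 - x_r)/(k-1)$, which is the solution of $x + (k-1)y = 1$, $x^2 + (k-1)y^2 = r$ with $x \ge y$. The names `s_star`, `f` and `h_entropy` are hypothetical.

```python
import math


def s_star(k: int, r: float) -> tuple:
    """Return (x_r, y_r) such that (x, y, ..., y) lies in the simplex and has
    squared 2-norm r; requires 1/k <= r <= 1."""
    if not 1 / k <= r <= 1:
        raise ValueError("need 1/k <= r <= 1")
    # max(0, .) guards against tiny negative values caused by rounding.
    x = (1 + math.sqrt(max(0.0, (k - 1) * (k * r - 1)))) / k
    y = (1 - x) / (k - 1)
    return x, y


def f(k: int, r: float, h) -> float:
    """f(r) = h(x_r) + (k - 1) * h(y_r) for a scalar function h on [0, 1]."""
    x, y = s_star(k, r)
    return h(x) + (k - 1) * h(y)


def h_entropy(x: float) -> float:
    """h(x) = -x log x, extended by continuity with h(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0
```

At $r = 1/k$ the vector is uniform and $f$ evaluates to $k\,h(1/k)$, which is $\log k$ for the entropy function.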

Our main inequality provides a sharp bound for the maximum of entropy-like
functions over stochastic matrices with a given 2-norm. In particular, in
Section 5 we will prove Theorem 7 by applying Theorem 9 below to the
function $h(x) = -x\log x$.

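Theorem 9. Fix an integer $k > 1$ and let $h : [0,1] \to \mathbb{R}$ be a
continuous strictly concave function, which is six times differentiable on
$(0,1)$. Assume that $h'(0^+) = \infty$, $h'(1^-) > -\infty$ and
$h^{(3)} > 0$, $h^{(4)} < 0$, $h^{(6)} < 0$ point-wise. Given
$1 \le \rho \le k$, for $A \in \mathcal{S}_k(\rho)$ define

\[ H(A) = \sum_{i=1}^{k}\sum_{j=1}^{k} h(a_{ij}) . \]

Then, for $f$ as in (6),

\[ H(A) \le \max\left\{ m \cdot k\,h\left(\frac{1}{k}\right)
   + (k - m) \cdot f\left(\frac{k\rho - m}{k(k-m)}\right) ;\;
   0 \le m \le \frac{k(k-\rho)}{k-1} \right\} . \tag{7} \]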

To understand the origin of the right-hand side in (7), consider the
following. Given $1 \le \rho \le k$ and an integer
$0 \le m \le \frac{k(k-\rho)}{k-1}$, let $B_\rho(m) \in \mathcal{S}_k(\rho)$
be the matrix whose first $m$ rows are the constant $1/k$ vector and the
remaining $k - m$ rows are the vector
$s^*\left(\frac{k\rho - m}{k(k-m)}\right)$. Define
$Q_\rho(m) = H(B_\rho(m))$. Theorem 9 then asserts that
$H(A) \le \max_m Q_\rho(m)$, where $0 \le m \le \frac{k(k-\rho)}{k-1}$ is real.
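The matrices $B_\rho(m)$ can be built explicitly for integer $m < k$. The sketch below, with hypothetical function names and with $h(x) = -x\log x$ as the example function, constructs $B_\rho(m)$ and evaluates the closed form $m \cdot k\,h(1/k) + (k-m) f\left(\frac{k\rho - m}{k(k-m)}\right)$ for $Q_\rho(m)$.

```python
import math


def h(x: float) -> float:
    """Example choice h(x) = -x log x, with h(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0


def s_star_row(k: int, r: float) -> list:
    """The row (x_r, y_r, ..., y_r) of Definition 8."""
    x = (1 + math.sqrt(max(0.0, (k - 1) * (k * r - 1)))) / k
    y = max(0.0, (1 - x) / (k - 1))
    return [x] + [y] * (k - 1)


def B(k: int, rho: float, m: int) -> list:
    """First m rows constant 1/k, remaining k - m rows equal to s*(r)."""
    r = (k * rho - m) / (k * (k - m))
    return [[1 / k] * k for _ in range(m)] + [s_star_row(k, r) for _ in range(k - m)]


def Q(k: int, rho: float, m: int) -> float:
    """Closed form m * k * h(1/k) + (k - m) * f(r) with r = (k rho - m)/(k (k - m))."""
    row = s_star_row(k, (k * rho - m) / (k * (k - m)))
    return m * k * h(1 / k) + (k - m) * sum(h(a) for a in row)
```

The choice of $r$ is what makes the squared 2-norm of the whole matrix equal to $\rho$: the uniform rows contribute $m/k$ and the remaining rows share $\rho - m/k$ equally.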

To prove Theorem 9 we observe that if $\rho_i$ denotes the squared 2-norm of
the $i$-th row then

\[ \max_{A \in \mathcal{S}_k(\rho)} H(A) =
   \max_{(\rho_1,\dots,\rho_k) \in \rho\Delta_k} \sum_{i=1}^{k}
   \max\left\{ \hat{h}(s) : s \in \Delta_k \cap \left(\sqrt{\rho_i}\, S^{k-1}\right) \right\} , \tag{8} \]

where $\hat{h}(s) = \sum_{j=1}^{k} h(s_j)$. The crucial point, reflecting the
product structure of $\mathcal{S}_k$, is that to maximize the sum in (8) it
suffices to maximize $\hat{h}$ in each row independently. The maximizer of
each row is characterized by the following proposition:
Proposition 10. Fix an integer $k > 1$ and let $h : [0,1] \to \mathbb{R}$ be
a continuous strictly concave function which is three times differentiable on
$(0,1)$. Assume that $h'(0^+) = \infty$, and $h''' > 0$ point-wise. Fix
$\frac{1}{k} \le r \le 1$ and assume that
$s = (s_1, \dots, s_k) \in \Delta_k \cap (\sqrt{r}\, S^{k-1})$ is such that

\[ \hat{h}(s) = \sum_{i=1}^{k} h(s_i) = \max\left\{ \sum_{i=1}^{k} h(t_i) ;\;
   (t_1, \dots, t_k) \in \Delta_k \cap \sqrt{r}\, S^{k-1} \right\} . \]

Then, up to a permutation of the coordinates, $s = s^*(r)$ where $s^*(r)$ is
as in Definition 8.

Thus, if $\rho_i$ denotes the squared 2-norm of the $i$-th row of
$A \in \mathcal{S}_k$, Proposition 10 implies that
$H(A) \le F(\rho_1, \dots, \rho_k) = \sum_{i=1}^{k} f(\rho_i)$, where $f$ is
as in (6). Hence, to prove Theorem 9 it suffices to give an upper bound on
$F(\rho_1, \dots, \rho_k)$, where
$(\rho_1, \dots, \rho_k) \in \rho\Delta_k \cap [1/k, 1]^k$. This is another
optimization problem on a symmetric polytope and had $f$ been concave it
would be trivial. Unfortunately, in general, $f$ is not concave (in
particular, it is not concave when $h(x) = -x\log x$). Nevertheless, the
conditions of Theorem 9 on $h$ suffice to impart some properties on $f$:

Lemma 11. Let $h : [0,1] \to \mathbb{R}$ be six times differentiable on
$(0,1)$ such that $h^{(3)} > 0$, $h^{(4)} < 0$ and $h^{(6)} < 0$ point-wise.
Then the function $f$ defined in (6) satisfies $f^{(3)} < 0$ point-wise.

The following lemma is the last ingredient in the proof of Theorem 9 as it 
will allow us to make use of Lemma 11 to bound F. 

Lemma 12. Let $\psi : [0,1] \to \mathbb{R}$ be continuous on $[0,1]$ and
three times differentiable on $(0,1)$. Assume that $\psi'(1^-) = -\infty$ and
$\psi^{(3)} < 0$ point-wise. Fix $\gamma \in (0, k]$ and let
$s = (s_1, \dots, s_k) \in [0,1]^k \cap \gamma\Delta_k$. Then

\[ \Psi(s) = \sum_{i=1}^{k} \psi(s_i) \le \max\left\{ m\psi(0)
   + (k - m)\,\psi\left(\frac{\gamma}{k-m}\right) ;\; m \in [0, k - \gamma] \right\} . \]



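To prove Theorem 9 we define $\psi : [0,1] \to \mathbb{R}$ as
$\psi(x) = f\left(\frac{1}{k} + \frac{k-1}{k}x\right)$. Lemma 11 and our
assumptions on $h$ imply that $\psi$ satisfies the conditions of Lemma 12
(the assumption that $h'(0^+) = \infty$ implies that
$\psi'(1^-) = -\infty$). Hence, applying Lemma 12 with
$\gamma = \frac{k(\rho - 1)}{k-1}$ yields Theorem 9, i.e.,

\[ F(\rho_1, \dots, \rho_k) = \sum_{i=1}^{k} \psi\left(\frac{k\rho_i - 1}{k-1}\right)
   \le \max\left\{ m\psi(0) + (k-m)\,\psi\left(\frac{k(\rho - 1)}{(k-1)(k-m)}\right) ;\;
   m \in \left[0, k - \frac{k(\rho-1)}{k-1}\right] \right\} . \]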
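4.1. Proof of Proposition 10. When $r = 1$ there is nothing to prove, so
assume that $r < 1$. We begin by observing that $s_i > 0$ for every
$i \in \{1, \dots, k\}$. Indeed, for the sake of contradiction, we may assume
without loss of generality (since $r < 1$) that $s_1 = 0$ and
$s_2 \ge s_3 > 0$. Fix $\varepsilon > 0$ and set

\[ \mu(\varepsilon) = \frac{-(s_2 - s_3 + \varepsilon)
   + \sqrt{(s_2 - s_3 + \varepsilon)^2 + 4\varepsilon(s_3 - \varepsilon)}}{2}
   \quad\text{and}\quad \nu(\varepsilon) = -\mu(\varepsilon) - \varepsilon . \]

Let $v(\varepsilon) = (\varepsilon, s_2 + \mu(\varepsilon), s_3 + \nu(\varepsilon), s_4, \dots, s_k)$.
Our choice of $\mu(\varepsilon)$ and $\nu(\varepsilon)$ ensures that for
$\varepsilon$ small enough
$v(\varepsilon) \in \Delta_k \cap (\sqrt{r} \cdot S^{k-1})$. Recall that, by
assumption, $h'(0) = \infty$ and $h'(x) < \infty$ for $x \in (0,1)$. When
$s_2 > s_3$ it is clear that $|\mu'(0)| < \infty$ and, thus,
$\frac{d}{d\varepsilon}\hat{h}(v(\varepsilon))\big|_{\varepsilon=0} = \infty$.
On the other hand, when $s_2 = s_3 = s$ it is not hard to see that

\[ \frac{d}{d\varepsilon}\hat{h}(v(\varepsilon))\Big|_{\varepsilon=0}
   = h'(0^+) - h'(s) + s\,h''(s) = \infty . \]

Thus, in both cases, we have
$\frac{d}{d\varepsilon}\hat{h}(v(\varepsilon))\big|_{\varepsilon=0} = \infty$,
which contradicts the maximality of $\hat{h}(s)$.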

Since $s_i > 0$ for every $i$ (and, therefore, $s_i < 1$ as well), we may use
Lagrange multipliers to deduce that there are $\lambda, \mu \in \mathbb{R}$
such that for every $i \in \{1, \dots, k\}$, $h'(s_i) = \lambda s_i + \mu$.
Observe that if we let $\psi(u) = h'(u) - \lambda u$ then
$\psi'' = h''' > 0$, i.e., $\psi$ is strictly convex. It follows in
particular that $|\psi^{-1}(\mu)| \le 2$. Thus, up to a permutation of the
coordinates, we may assume that there is an integer $1 \le m \le k$ and
$a, b \in (0,1)$ such that $s_i = a$ for $i \in \{1, \dots, m\}$ and
$s_i = b$ for $i \in \{m+1, \dots, k\}$. Without loss of generality
$a \ge b$ (so that in particular $a \ge 1/k$ and $b \le 1/k$). Since
$ma + (k-m)b = 1$ and $ma^2 + (k-m)b^2 = r$, it follows that

\[ a = \frac{1}{k} + \frac{1}{k}\sqrt{\frac{k-m}{m}(kr-1)} \quad\text{and}\quad
   b = \frac{1}{k} - \frac{1}{k}\sqrt{\frac{m}{k-m}(kr-1)} . \]

(The choice of the minus sign in the solution of the quadratic equation
defining $b$ is correct since $b \le 1/k$.) Define
$\alpha, \beta : [1, r^{-1}] \to \mathbb{R}$ by

\[ \alpha(t) = \frac{1}{k} + \frac{1}{k}\sqrt{\frac{k-t}{t}(kr-1)} \quad\text{and}\quad
   \beta(t) = \frac{1}{k} - \frac{1}{k}\sqrt{\frac{t}{k-t}(kr-1)} . \]

Furthermore, set $\varphi(t) = t \cdot h(\alpha(t)) + (k-t) \cdot h(\beta(t))$,
so that $\hat{h}(s) = \varphi(m)$.

The proof will be complete once we check that $\varphi$ is strictly
decreasing. Observe that

\[ t\alpha(t) + (k-t)\beta(t) = 1 , \qquad t\alpha(t)^2 + (k-t)\beta(t)^2 = r . \]

Differentiating these identities we find that

\[ \alpha(t) + t\alpha'(t) - \beta(t) + (k-t)\beta'(t) = 0 , \qquad
   \alpha(t)^2 + 2t\alpha(t)\alpha'(t) - \beta(t)^2 + 2(k-t)\beta(t)\beta'(t) = 0 , \]
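implying

\[ \alpha'(t) = -\frac{\alpha(t) - \beta(t)}{2t} \quad\text{and}\quad
   \beta'(t) = -\frac{\alpha(t) - \beta(t)}{2(k-t)} . \]

Hence,

\[ \varphi'(t) = h(\alpha(t)) - h(\beta(t)) + t\alpha'(t)h'(\alpha(t)) + (k-t)\beta'(t)h'(\beta(t))
   = h(\alpha(t)) - h(\beta(t)) - \frac{\alpha(t) - \beta(t)}{2} \cdot
     \left[h'(\alpha(t)) + h'(\beta(t))\right] . \]

Therefore, in order to show that $\varphi'(t) < 0$, it is enough to prove
that if $0 \le \beta < \alpha < 1$ then

\[ h(\alpha) - h(\beta) - \frac{\alpha - \beta}{2}\left[h'(\alpha) + h'(\beta)\right] < 0 . \]

Fix $\beta$ and define $\zeta : [\beta, 1] \to \mathbb{R}$ by
$\zeta(\alpha) = h(\alpha) - h(\beta) - \frac{\alpha-\beta}{2}\left[h'(\alpha) + h'(\beta)\right]$.
Now,

\[ \zeta'(\alpha) = \frac{\alpha - \beta}{2}\left(\frac{h'(\alpha) - h'(\beta)}{\alpha - \beta} - h''(\alpha)\right) . \]

By the Mean Value Theorem there is $\beta < \theta < \alpha$ such that

\[ \zeta'(\alpha) = \frac{\alpha - \beta}{2}\left[h''(\theta) - h''(\alpha)\right] < 0 , \]

since $h''' > 0$. This shows that $\zeta$ is strictly decreasing. Since
$\zeta(\beta) = 0$ it follows that for $\alpha \in (\beta, 1]$,
$\zeta(\alpha) < 0$, which concludes the proof of Proposition 10. □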
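The key inequality $h(\alpha) - h(\beta) - \frac{\alpha-\beta}{2}[h'(\alpha) + h'(\beta)] < 0$ says that, when $h''' > 0$, the trapezoid rule applied to $h'$ on $[\beta, \alpha]$ overestimates $h(\alpha) - h(\beta)$. A quick grid check for the example $h(x) = -x\log x$ (whose third derivative $1/x^2$ is positive) can be sketched as follows; the function names are hypothetical.

```python
import math


def h(x: float) -> float:
    """Example function h(x) = -x log x on (0, 1]."""
    return -x * math.log(x)


def h_prime(x: float) -> float:
    """Derivative h'(x) = -log x - 1."""
    return -math.log(x) - 1


def zeta(alpha: float, beta: float) -> float:
    """h(alpha) - h(beta) - (alpha - beta)/2 * (h'(alpha) + h'(beta))."""
    return h(alpha) - h(beta) - (alpha - beta) / 2 * (h_prime(alpha) + h_prime(beta))
```

The grid is kept coarse so that the (cubically small) negative values stay well above floating-point noise.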

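4.2. Proof of Lemma 11. If we make the linear change of variable
$z = (k-1)(kx-1)$ then our goal is to show that the function
$g : [0, (k-1)^2] \to \mathbb{R}$, given by

\[ g(z) = h\left(\frac{1}{k} + \frac{\sqrt{z}}{k}\right)
   + (k-1)\,h\left(\frac{1}{k} - \frac{\sqrt{z}}{k(k-1)}\right) , \]

satisfies $g''' < 0$ point-wise. Differentiation gives

\[ 8kz^{5/2} g'''(z) =
   \frac{z}{k^2}\,h'''\left(\frac{1}{k} + \frac{\sqrt{z}}{k}\right)
   - \frac{z}{k^2(k-1)^2}\,h'''\left(\frac{1}{k} - \frac{\sqrt{z}}{k(k-1)}\right)
   - \frac{3\sqrt{z}}{k}\,h''\left(\frac{1}{k} + \frac{\sqrt{z}}{k}\right)
   - \frac{3\sqrt{z}}{k(k-1)}\,h''\left(\frac{1}{k} - \frac{\sqrt{z}}{k(k-1)}\right)
   + 3h'\left(\frac{1}{k} + \frac{\sqrt{z}}{k}\right)
   - 3h'\left(\frac{1}{k} - \frac{\sqrt{z}}{k(k-1)}\right) . \]

Denote $a = \frac{\sqrt{z}}{k}$ and $b = \frac{\sqrt{z}}{k(k-1)}$. Then
$8kz^{5/2} g'''(z) = \psi(a) - \psi(-b)$, where

\[ \psi(t) = t^2 h'''\left(t + \frac{1}{k}\right) - 3t\,h''\left(t + \frac{1}{k}\right)
   + 3h'\left(t + \frac{1}{k}\right) . \]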
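Now

\[ \psi'(t) = t^2 h''''\left(t + \frac{1}{k}\right) - t\,h'''\left(t + \frac{1}{k}\right) . \]

The assumptions on $h'''$ and $h''''$ imply that $\psi'(t) < 0$ for $t > 0$,
and since $a \ge b$, it follows that $\psi(a) \le \psi(b)$. Since
$8kz^{5/2} g'''(z) = \psi(a) - \psi(-b) = [\psi(a) - \psi(b)] + [\psi(b) - \psi(-b)]$,
it suffices to show that for every $b > 0$,
$\zeta(b) = \psi(b) - \psi(-b) < 0$. Since $\zeta(0) = 0$, this will follow
once we verify that $\zeta'(b) < 0$ for $b > 0$. Observe now that
$\zeta'(b) = b\chi(b)$, where

\[ \chi(b) = b\left[h''''\left(\frac{1}{k} + b\right) + h''''\left(\frac{1}{k} - b\right)\right]
   - \left[h'''\left(\frac{1}{k} + b\right) - h'''\left(\frac{1}{k} - b\right)\right] . \]

Our goal is to show that $\chi(b) < 0$ for $b > 0$, and since $\chi(0) = 0$
it is enough to show that $\chi'(b) < 0$. But

\[ \chi'(b) = b\left[h^{(5)}\left(\frac{1}{k} + b\right) - h^{(5)}\left(\frac{1}{k} - b\right)\right] , \]

so that the required result follows from the fact that $h^{(5)}$ is strictly
decreasing.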

4.3. Proof of Lemma 12. Before proving Lemma 12 we require one more 
preparatory fact. 

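Lemma 13. Fix $0 < \gamma < k$. Let $\psi : [0,1] \to \mathbb{R}$ be
continuous on $[0,1]$ and three times differentiable on $(0,1)$. Assume that
$\psi'(1^-) = -\infty$ and $\psi''' < 0$ point-wise. Consider the set
$\mathcal{A} \subset \mathbb{R}^3$ defined by

\[ \mathcal{A} = \{(a,b,\ell) \in (0,1] \times [0,1] \times (0,k] ;\;
   b < a \text{ and } \ell a + (k-\ell)b = \gamma\} . \]

Define $g : \mathcal{A} \to \mathbb{R}$ by
$g(a,b,\ell) = \ell\psi(a) + (k-\ell)\psi(b)$. If $(a,b,\ell) \in \mathcal{A}$
is such that
$g(a,b,\ell) = \max_{(a',b',\ell') \in \mathcal{A}} g(a',b',\ell')$ then
$a = \gamma/\ell$.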

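Proof of Lemma 13. Observe that if $b = 0$ or $\ell = k$ we are done.
Therefore, assume that $b > 0$ and $\ell < k$. We claim that $a < 1$. Indeed,
if $a = 1$ then $b = \frac{\gamma - \ell}{k - \ell} < 1$, implying that for
small enough $\varepsilon > 0$,
$w_\varepsilon = \left(1 - \varepsilon, b + \frac{\ell\varepsilon}{k-\ell}, \ell\right) \in \mathcal{A}$.
But
$\frac{d}{d\varepsilon} g(w_\varepsilon)\big|_{\varepsilon=0} = -\ell\psi'(1^-) + \ell\psi'(b) = \infty$,
which contradicts the maximality of $g(a,b,\ell)$.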

Since $a \in (0,1)$ and $\ell \in (0,k)$ we can use Lagrange multipliers to
deduce that there is $\lambda \in \mathbb{R}$ such that
$\ell\psi'(a) = \lambda\ell$, $(k-\ell)\psi'(b) = \lambda(k-\ell)$ and
$\psi(a) - \psi(b) = \lambda(a - b)$. Combined, these imply

\[ \psi'(a) = \psi'(b) = \frac{\psi(a) - \psi(b)}{a - b} . \]

By the Mean Value Theorem, there exists $\theta \in (b,a)$ such that
$\psi'(\theta) = \frac{\psi(a) - \psi(b)}{a - b}$. But, since $\psi''' < 0$,
$\psi'$ cannot take the same value three times, yielding the desired
contradiction. □

We now turn to the proof of Lemma 12. Let
$s \in [0,1]^k \cap \gamma\Delta_k$ be such that $\Psi(s)$ is maximal. If
$s_1 = \cdots = s_k = 1$ then we are done, so we assume that there exists $i$
for which $s_i < 1$. Observe that in this case $s_i < 1$ for every
$i \in \{1, \dots, k\}$. Indeed, assuming the contrary we may also assume
without loss of generality that $s_1 = 1$ and $s_2 < 1$. For every
$\varepsilon > 0$ consider the vector
$u(\varepsilon) = (1 - \varepsilon, s_2 + \varepsilon, s_3, \dots, s_k)$. For
$\varepsilon$ small enough $u(\varepsilon) \in [0,1]^k \cap \gamma\Delta_k$.
But $\frac{d}{d\varepsilon}\Psi(u(\varepsilon))\big|_{\varepsilon=0} = \infty$,
which contradicts the maximality of $\Psi(s)$.

Without loss of generality we can further assume that
$s_1, \dots, s_q > 0$ for some $q \le k$ and $s_i = 0$ for all $i > q$.
Consider the function $\tilde{\Psi}(t) = \sum_{i=1}^{q} \psi(t_i)$ defined on
$[0,1]^q \cap \gamma\Delta_q$. Clearly, $\tilde{\Psi}$ is maximal at
$(s_1, \dots, s_q)$. Since $s_i \in (0,1)$ for every $i \in \{1, \dots, q\}$,
we may use Lagrange multipliers to deduce that there is
$\lambda \in \mathbb{R}$ such that for every $i \in \{1, \dots, q\}$,
$\psi'(s_i) = \lambda$. Since $\psi''' < 0$, $\psi'$ is strictly concave. It
follows in particular that the equation $\psi'(y) = \lambda$ has at most two
solutions, so that up to a permutation of the coordinates we may assume that
there is an integer $0 < \ell \le q$ and $0 < b < a < 1$ such that $s_i = a$
for $i \in \{1, \dots, \ell\}$ and $s_i = b$ for
$i \in \{\ell+1, \dots, q\}$. Now, using the notation of Lemma 13 we have
that $(a, b, \ell) \in \mathcal{A}$ so that

\[ \Psi(s) = (k-q)\psi(0) + g(a,b,\ell)
   \le (k-q)\psi(0) + \max\left\{ \theta\psi(0) + (q-\theta)\,\psi\left(\frac{\gamma}{q-\theta}\right) ;\;
      \theta \in [0, q-\gamma] \right\}
   \le \max\left\{ m\psi(0) + (k-m)\,\psi\left(\frac{\gamma}{k-m}\right) ;\;
      m \in [0, k-\gamma] \right\} . \]

5. Proof of Theorem 7 

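Let $h(x) = -x\log x$ and note that $h'(x) = -\log x - 1$,
$h'''(x) = \frac{1}{x^2}$, $h^{(4)}(x) = -\frac{2}{x^3}$ and
$h^{(6)}(x) = -\frac{24}{x^5}$, so that the conditions of Theorem 9 are
satisfied in this particular case. By Theorem 9 it is, thus, enough to show
that for $c \le c_{k-1} = (k-1)\log(k-1)$,

\[ \frac{m\log k}{k} + \frac{k-m}{k}\,f\left(\frac{k\rho - m}{k(k-m)}\right)
   + c\log\left(1 - \frac{2}{k} + \frac{\rho}{k^2}\right)
   \le \log k + 2c\log\left(1 - \frac{1}{k}\right) \tag{9} \]

for every $1 \le \rho \le k$ and $0 \le m \le \frac{k(k-\rho)}{k-1}$. Here $f$
is as in (6) for $h(x) = -x\log x$.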
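Inequality (9) simplifies to

\[ c\log\left(1 + \frac{\rho - 1}{(k-1)^2}\right) \le
   \left(1 - \frac{m}{k}\right)\left(\log k - f\left(\frac{k\rho - m}{k(k-m)}\right)\right) . \tag{10} \]

Setting $t = m/k$, $s = \rho - 1$ and using the inequality
$\log(1 + a) \le a$, it suffices to demand that for every
$0 \le t \le 1 - \frac{s}{k-1}$ and $0 \le s \le k - 1$,

\[ \frac{cs}{(k-1)^2} \le (1 - t)\left(\log k - f\left(\frac{1}{k} + \frac{s}{k(1-t)}\right)\right) . \tag{11} \]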

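To prove (11) we define $\eta : (0, 1 - 1/k] \to \mathbb{R}$ by

\[ \eta(y) = \frac{\log k - f\left(\frac{1}{k} + y\right)}{y} , \]

and $\eta(0) = -f'\left(\frac{1}{k}\right) = \frac{k}{2}$, making $\eta$
continuous on $[0, 1 - 1/k]$. Observe that (11) reduces to

\[ c \le \frac{(k-1)^2}{k} \cdot \eta\left(\frac{s}{k(1-t)}\right) . \]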

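Now, $\eta'(y) = \frac{\zeta(y)}{y^2}$, where
$\zeta(y) = f\left(\frac{1}{k} + y\right) - f\left(\frac{1}{k}\right) - y f'\left(\frac{1}{k} + y\right)$.
Observe that $\zeta'(y) = -y f''\left(\frac{1}{k} + y\right)$ so, by
Lemma 11, $\zeta'$ has at most one zero in $\left(0, 1 - \frac{1}{k}\right)$
and hence, since $\zeta(0) = 0$, $\zeta$ can have at most one zero there. A
straightforward computation gives that
$\zeta\left(\frac{(k-2)^2}{k(k-1)}\right) = 0$, so $\eta$ achieves its global
minimum on $\left[0, 1 - \frac{1}{k}\right]$ at
$y \in \left\{0, \frac{(k-2)^2}{k(k-1)}, 1 - \frac{1}{k}\right\}$. Direct
computation gives
$\eta\left(1 - \frac{1}{k}\right) = \frac{k}{k-1} \cdot \log k$,
$\eta\left(\frac{(k-2)^2}{k(k-1)}\right) = \frac{k-1}{k-2} \cdot \log(k-1)$
and, by definition, $\eta(0) = \frac{k}{2}$. Hence

\[ \frac{(k-1)^2}{k} \cdot \eta\left(\frac{s}{k(1-t)}\right) \ge
   \frac{(k-1)^2}{k} \cdot \min\left\{ \frac{k}{2},\; \frac{k-1}{k-2} \cdot \log(k-1),\;
   \frac{k}{k-1} \cdot \log k \right\}
   = \frac{(k-1)^3}{k(k-2)} \cdot \log(k-1) > c_{k-1} , \tag{12} \]

where (12) follows from elementary calculus.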
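The three candidate values of $\eta$ and the final bound can be checked numerically for concrete $k \ge 3$. The sketch below takes $h(x) = -x\log x$, $f$ as in (6), $\eta(y) = (\log k - f(1/k + y))/y$, and the interior candidate $y^* = \frac{(k-2)^2}{k(k-1)}$ (the point at which $s^*(1/k + y)$ has first coordinate $1 - 1/k$); the function names are hypothetical.

```python
import math


def h(x: float) -> float:
    """h(x) = -x log x, with h(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0


def f(k: int, r: float) -> float:
    """f(r) = h(x_r) + (k - 1) h(y_r) as in Definition 8 / equation (6)."""
    x = (1 + math.sqrt(max(0.0, (k - 1) * (k * r - 1)))) / k
    y = max(0.0, (1 - x) / (k - 1))  # clamp rounding noise at r = 1
    return h(x) + (k - 1) * h(y)


def eta(k: int, y: float) -> float:
    """eta(y) = (log k - f(1/k + y)) / y, extended by eta(0) = k/2."""
    if y == 0:
        return k / 2
    return (math.log(k) - f(k, 1 / k + y)) / y


def eta_candidates(k: int) -> tuple:
    """Values of eta at 0, at y* = (k-2)^2 / (k (k-1)), and at 1 - 1/k."""
    y_star = (k - 2) ** 2 / (k * (k - 1))
    return eta(k, 0.0), eta(k, y_star), eta(k, 1 - 1 / k)
```

Multiplying the smallest candidate by $(k-1)^2/k$ gives the largest $c$ that this argument certifies, which can then be compared with $c_{k-1} = (k-1)\log(k-1)$.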

Remark. The above analysis shows that Theorem 7 is asymptotically
optimal. Indeed, let $A$ be the stochastic matrix whose first $k - 1$ rows
are the constant $1/k$ vector and whose last row is the vector $s^*(r)$,
defined in Definition 8, for $r = \frac{1}{k} + \frac{(k-2)^2}{k(k-1)}$. This
matrix corresponds to $m = k - 1$ and $\rho = 1 + \frac{(k-2)^2}{k(k-1)}$ in
(10), and a direct computation shows that any $c$ for which Theorem 7 holds
must satisfy $c < c_{k-1} + 1$.
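6. Appendix: Proof of Lemma 6

If $(\ell_{ij})$ are nonnegative integers such that
$\sum_{i,j} \ell_{ij} = n$, standard Stirling approximations imply

\[ \frac{n!}{\prod_{i=1}^{k}\prod_{j=1}^{k} \ell_{ij}!} \le
   \left[\prod_{i=1}^{k}\prod_{j=1}^{k}\left(\frac{\ell_{ij}}{n}\right)^{-\ell_{ij}/n}\right]^{n}
   \cdot \min\left\{ 3\sqrt{n},\;
   (2\pi n)^{-\frac{k^2-1}{2}}\left(\prod_{i=1}^{k}\prod_{j=1}^{k}\frac{\ell_{ij}}{n}\right)^{-1/2} \right\} . \tag{13} \]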
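A Stirling-type bound of this kind can be sanity-checked in log space. The sketch below assumes the bound has the form $\frac{n!}{\prod \ell!} \le \left[\prod (\ell/n)^{-\ell/n}\right]^n \cdot \min\{3\sqrt{n}, (2\pi n)^{-(N-1)/2} (\prod \ell/n)^{-1/2}\}$ for $N$ cells, with the second option in the minimum used only when every cell is positive. The function names are hypothetical.

```python
import math


def log_multinomial(cells: list) -> float:
    """log( n! / prod(l!) ) with n = sum(cells), computed via lgamma."""
    n = sum(cells)
    return math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in cells)


def log_stirling_bound(cells: list) -> float:
    """Logarithm of the assumed upper bound on the multinomial coefficient."""
    n = sum(cells)
    # log of [prod (l/n)^(-l/n)]^n, i.e. n times the entropy of the cell frequencies
    entropy_term = -sum(x * math.log(x / n) for x in cells if x > 0)
    first = math.log(3) + 0.5 * math.log(n)
    if all(x > 0 for x in cells):
        second = -(len(cells) - 1) / 2 * math.log(2 * math.pi * n) - 0.5 * sum(
            math.log(x / n) for x in cells
        )
        return entropy_term + min(first, second)
    return entropy_term + first
```

For cells that are all of order $n$, the second option is the smaller one and recovers the polynomial factor that drives the $n^{-(k^2-1)/2}$ decay of the central terms.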



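Since $|\mathcal{D}| \le (n+1)^{(k-1)^2}$, the contribution to the sum in (4)
of the terms for which $\rho_{\frac{k}{n}L} > 1 + 1/(4k^2)$ can, thus, be
bounded by

\[ 3\sqrt{n}\,(n+1)^{(k-1)^2} \max_{L}\left[k\,e^{\mathcal{H}\left(\frac{k}{n}L\right) + \varphi\left(\frac{k}{n}L\right)}\right]^{n}
   \le 3n^{k^2}\left(k^2 e^{\varphi(J_k)}\right)^{n} \cdot e^{-\frac{\beta n}{4k^2}}
   = O\left(n^{-k^2}\right)\left(k^2 e^{\varphi(J_k)}\right)^{n} , \tag{14} \]

where the maximum is over the matrices $L \in \mathcal{D}$ in question.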

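Furthermore, if $L \in \mathcal{D}$ is such that
$\rho_{\frac{k}{n}L} \le 1 + \frac{1}{4k^2}$, then for every
$1 \le i,j \le k$ we have

\[ \left(\ell_{ij} - \frac{n}{k^2}\right)^2 \le
   \sum_{s=1}^{k}\sum_{t=1}^{k}\left(\ell_{st} - \frac{n}{k^2}\right)^2
   = \left(\rho_{\frac{k}{n}L} - 1\right)\frac{n^2}{k^2} \le \frac{n^2}{4k^4} . \]

Therefore, for such $L$ we must have $\ell_{ij} \ge n/(2k^2)$ for every
$i,j$. Therefore, by (13), (14) we get

\[ \sum_{L \in \mathcal{D}} \frac{n!}{\prod_{i=1}^{k}\prod_{j=1}^{k} \ell_{ij}!}
   \cdot \exp\left[n \cdot \varphi\left(\frac{k}{n}L\right)\right]
   \le \frac{C(\beta,k)}{n^{(k^2-1)/2}} \cdot \left(k^2 e^{\varphi(J_k)}\right)^{n}
   \cdot \sum_{L \in \mathcal{D}} e^{-\beta n\left(\rho_{\frac{k}{n}L} - 1\right)} . \tag{15} \]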

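Denote by $M_k(\mathbb{R})$ the space of all $k \times k$ matrices over
$\mathbb{R}$ and let $F$ be the subspace of $M_k(\mathbb{R})$ consisting of
all matrices $X = (x_{ij})$ for which the sum of each row and each column is
0. The dimension of $F$ is $(k-1)^2$. Denote by $B_\infty$ the unit cube of
$M_k(\mathbb{R})$, i.e. the set of all $k \times k$ matrices $A = (a_{ij})$
such that $a_{ij} \in [-1/2, 1/2]$ for all $1 \le i,j \le k$. For
$L \in \mathcal{D}$ we define
$T(L) = L - \frac{n}{k}J_k + (F \cap B_\infty)$, i.e., the tile
$F \cap B_\infty$ shifted by $L - \frac{n}{k}J_k$.

Lemma 14. For every $L \in \mathcal{D}$,

\[ e^{-\beta n\left(\rho_{\frac{k}{n}L} - 1\right)} \le e^{\frac{\beta k^4}{4n}}
   \int_{T(L)} e^{-\frac{\beta k^2}{2n}\|X\|_2^2}\, dX . \]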
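Proof. By the triangle inequality, we see that for any matrix $X$

\[ \left\|L - \frac{n}{k}J_k\right\|_2^2 \ge \frac{1}{2}\|X\|_2^2
   - \left\|L - \frac{n}{k}J_k - X\right\|_2^2 . \tag{16} \]

Thus, for $X \in T(L)$ we have
$\|X\|_2^2 \le 2\left(\rho_{\frac{k}{n}L} - 1\right)\left(\frac{n}{k}\right)^2 + \frac{k^2}{2}$,
since
$\left\|L - \frac{n}{k}J_k\right\|_2^2 = \left(\rho_{\frac{k}{n}L} - 1\right)\left(\frac{n}{k}\right)^2$
and $\left\|L - \frac{n}{k}J_k - X\right\|_2^2 \le \frac{k^2}{4}$. Therefore,

\[ \int_{T(L)} e^{-\frac{\beta k^2}{2n}\|X\|_2^2}\, dX \ge
   \int_{T(L)} e^{-\frac{\beta k^4}{4n}} \cdot e^{-\beta n\left(\rho_{\frac{k}{n}L} - 1\right)}\, dX
   = e^{-\frac{\beta k^4}{4n}} \cdot e^{-\beta n\left(\rho_{\frac{k}{n}L} - 1\right)}
     \mathrm{vol}(F \cap B_\infty) . \]

It is a theorem of Vaaler [15] that for any subspace $E$,
$\mathrm{vol}(E \cap B_\infty) \ge 1$, concluding the proof. □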

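Thus, to bound the second sum in (15) we apply Lemma 14 to get

\[ \sum_{L \in \mathcal{D}} e^{-\beta n\left(\rho_{\frac{k}{n}L} - 1\right)}
   \le e^{\frac{\beta k^4}{4n}} \sum_{L \in \mathcal{D}} \int_{T(L)} e^{-\frac{\beta k^2}{2n}\|X\|_2^2}\, dX
   \le e^{\frac{\beta k^4}{4n}} \int_{F} e^{-\frac{\beta k^2}{2n}\|X\|_2^2}\, dX
   = e^{\frac{\beta k^4}{4n}} \int_{\mathbb{R}^{(k-1)^2}} e^{-\frac{\beta k^2}{2n}\|X\|_2^2}\, dX
   = e^{\frac{\beta k^4}{4n}} \left(\frac{2\pi n}{\beta k^2}\right)^{(k-1)^2/2} , \]

where we have used the fact that the interiors of the "tiles"
$\{T(L)\}_{L \in \mathcal{D}}$ are disjoint, that the Gaussian measure is
rotationally invariant and that $F$ is $(k-1)^2$ dimensional.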